[IBCDPE-835] Revamps GX Report Uploads #130

Merged 7 commits into dev from bwmac/IBCDPE-835/gx_report_naming on Mar 18, 2024

@BWMac BWMac commented Mar 18, 2024

Problem:

Currently, GreatExpectationsRunner uploads the HTML report generated by evaluating an expectation suite to a separate Synapse folder for each dataset, and names each file after the timestamp at which the report was generated. This causes two problems: files proliferate in those folders (one per agora-data-tools run), and it is difficult to find a report of interest within them.

Solution:

Update agora-data-tools to write all reports into a single folder, with each report file named after its expectation suite/dataset. We then rely on Synapse's versioning to track each new report as a new version of the same file, by uploading the reports with forceVersion=True. This should give anyone who wants to examine reports a much better interface: one file per dataset, with its history in the version list.
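
To make the difference concrete, here is a minimal sketch of the two naming schemes. It is not the agora-data-tools code: the timestamp format, the helper names, and `ToyVersionedFolder` (an in-memory stand-in for a folder that versions files by name) are all assumptions made for illustration, and nothing here calls Synapse.

```python
from collections import defaultdict
from datetime import datetime


def timestamp_report_name(run_time: datetime) -> str:
    """Old scheme: a unique file name per run (exact format is assumed)."""
    return run_time.strftime("%Y-%m-%d_%H-%M-%S") + ".html"


def suite_report_name(suite_name: str) -> str:
    """New scheme: a stable file name per expectation suite/dataset."""
    return f"{suite_name}.html"


class ToyVersionedFolder:
    """In-memory stand-in for a folder that versions files by name.

    This is NOT the Synapse client. It only mimics the idea that storing a
    file under an existing name adds a version instead of adding a file.
    """

    def __init__(self) -> None:
        self._versions = defaultdict(list)

    def store(self, name: str, content: str) -> int:
        """Store content under name and return the resulting version number."""
        self._versions[name].append(content)
        return len(self._versions[name])

    def file_count(self) -> int:
        return len(self._versions)


# Three hypothetical runs on consecutive days.
runs = [datetime(2024, 3, day, 9, 0) for day in (16, 17, 18)]

old_folder = ToyVersionedFolder()
new_folder = ToyVersionedFolder()
for run_time in runs:
    old_folder.store(timestamp_report_name(run_time), "<html>report</html>")
    latest_version = new_folder.store(
        suite_report_name("example_dataset"), "<html>report</html>"
    )

print(old_folder.file_count())  # 3: one file per run
print(new_folder.file_count())  # 1: a single file for the dataset
print(latest_version)           # 3: three versions of that file
```

In the real upload, the versioning would come from Synapse itself via the forceVersion=True argument mentioned above rather than from anything like the toy class.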

Notes:

  • The old folders are still in Synapse. If we don't care about those past reports, I can delete them; otherwise, I could index them back into the new files' version histories. Let me know.
  • Changes were made to the configuration files. gx_folder is now a "global" configuration that points to the "Great Expectations Reports" folder for prod and testing. Datasets with existing expectation suites now have gx_enabled as part of their individual configuration, and gx_enabled is checked during data processing to determine whether GX should be run against a dataset.
  • Tests in test_gx.py and test_process are updated as needed.
  • Documentation is updated to reflect the changes to the configuration files and the GX expectation suite contribution process.
  • An example report file is in the testing folder.
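
As a rough illustration of the configuration note above, the sketch below shows how a global gx_folder and a per-dataset gx_enabled flag might be read. Only the two key names come from this PR; the surrounding layout, the dataset names, the placeholder folder ID, and the `datasets_to_validate` helper are hypothetical and may not match the real configuration files.

```python
from typing import Any, Dict, List

# Hypothetical, simplified configuration. The key names gx_folder and
# gx_enabled come from the notes above; everything else is assumed.
example_config: Dict[str, Any] = {
    "gx_folder": "syn00000000",  # placeholder ID, not a real folder
    "datasets": [
        {"dataset_with_suite": {"gx_enabled": True}},
        {"dataset_without_suite": {}},  # no flag, so GX is skipped
    ],
}


def datasets_to_validate(config: Dict[str, Any]) -> List[str]:
    """Return the names of datasets whose settings have gx_enabled set."""
    enabled = []
    for entry in config.get("datasets", []):
        for name, settings in entry.items():
            # A missing flag is treated as "do not run GX" in this sketch.
            if settings.get("gx_enabled", False):
                enabled.append(name)
    return enabled


print(example_config["gx_folder"])
print(datasets_to_validate(example_config))
```

Treating a missing flag as disabled is a design choice of this sketch: datasets without an expectation suite need no configuration change at all.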


sonarcloud bot commented Mar 18, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@BWMac BWMac marked this pull request as ready for review March 18, 2024 21:47

@JessterB JessterB left a comment


lgtm!


@jaclynbeck-sage jaclynbeck-sage left a comment


Looks good!

@BWMac BWMac merged commit b7326d8 into dev Mar 18, 2024
9 checks passed
@BWMac BWMac deleted the bwmac/IBCDPE-835/gx_report_naming branch March 18, 2024 23:05
3 participants